Using Some Web Content Mining Techniques for Arabic Text Classification
Abstract
With the massive rise in the volume of information available on the World Wide Web, and the emerging need for better techniques to access this information, there has been a strong resurgence of interest in web mining research. Web mining applies data mining and other information processing techniques to the World Wide Web to discover useful patterns. People can take advantage of these patterns to access the World Wide Web more efficiently. Web mining can be divided into three categories: content mining, usage mining, and structure mining. In this paper we apply web content mining to extract non-English knowledge from the web. We investigate and evaluate some common methods used by web mining systems that have to deal with language-specific text processing. An Arabic language-independent algorithm will be used as the machine learning system. The algorithm uses a set of features, represented as a vector of keywords, in the learning process to perform text classification. Such an algorithm is typically used to classify documents written in a non-English language. It categorizes documents with two classifiers: Classifier K-Nearest Neighbor (CK-NN) and Classifier Naïve Bayes (CNB). These algorithms, however, usually depend on phrase segmentation and extraction programs to generate the set of features or keywords that represent the retrieved web documents. The proposed Arabic text classification system is called Arabic Text Classifier (ATC). The main goal of ATC is to compare the results of the two classifiers (CK-NN and CNB) and select the one with the best average accuracy rate to start the retrieval process. This paper introduces the theory behind ATC without demonstrating a practical implementation of the system.
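The comparison the abstract describes can be illustrated with a minimal sketch. It is not the ATC system itself, which the paper presents only in theory. Everything here is an assumption made for illustration: the toy corpus, the labels `sport` and `economy`, the helper names, whitespace tokenization as a stand-in for Arabic phrase segmentation, cosine similarity as the K-NN distance, and Laplace smoothing in the Naïve Bayes model.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Assumption: whitespace split stands in for real Arabic segmentation.
    return text.split()

def cosine(a, b):
    # Cosine similarity between two keyword-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(train, doc, k=3):
    # Majority vote among the k most similar training documents.
    vec = Counter(tokenize(doc))
    ranked = sorted(train,
                    key=lambda item: cosine(vec, Counter(tokenize(item[0]))),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def nb_train(train):
    # Collect per-class document counts and per-class keyword counts.
    class_docs, term_counts, vocab = Counter(), defaultdict(Counter), set()
    for text, label in train:
        class_docs[label] += 1
        for tok in tokenize(text):
            term_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, term_counts, vocab

def nb_predict(model, doc):
    # Multinomial Naive Bayes with Laplace (add-one) smoothing, in log space.
    class_docs, term_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_terms = sum(term_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for tok in tokenize(doc):
            if tok in vocab:  # ignore keywords never seen in training
                score += math.log((term_counts[label][tok] + 1)
                                  / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(predict, test):
    # Fraction of test documents whose predicted label matches the true one.
    return sum(predict(text) == label for text, label in test) / len(test)

# Hypothetical toy corpus: (keywords, category).
TRAIN = [
    ("فريق مباراة هدف", "sport"),
    ("لاعب فريق بطولة", "sport"),
    ("مباراة لاعب هدف", "sport"),
    ("بنك سوق أسهم", "economy"),
    ("سوق نفط أسعار", "economy"),
    ("بنك أسعار استثمار", "economy"),
]
TEST = [
    ("فريق لاعب هدف", "sport"),
    ("سوق بنك أسهم", "economy"),
]

nb_model = nb_train(TRAIN)
results = {
    "CK-NN": accuracy(lambda d: knn_predict(TRAIN, d, k=3), TEST),
    "CNB": accuracy(lambda d: nb_predict(nb_model, d), TEST),
}
# Pick the classifier with the highest accuracy; ties go to the first entry.
best = max(results, key=results.get)
print(results, "selected:", best)
```

On a real collection the accuracy would be averaged over several held-out splits rather than one small test set, and the keyword vectors would come from a proper segmentation and extraction step.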
Similar resources
Arabic News Articles Classification Using Vectorized-Cosine Based on Seed Documents
Besides its own merits, text classification (TC) has become a cornerstone in many applications. Work presented here is part of and a prerequisite for a project we have undertaken to create a corpus for Arabic text processing. It is an attempt to create modules automatically that would help speed up the process of classification for any text categorization task. It also serves as a tool for...
The Effect of Stemming on Arabic Text Classification: An Empirical Study
The information world is rich in documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification is used to aid the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the web. Many research papers, conducted within the field of text classification, were applied to E...
Cross Language Information Retrieval Model For Discovering WSDL Documents Using Arabic Language Query
Web service discovery is the process of finding a suitable Web service for a given user's query by analyzing the web service's WSDL content and finding the best match for the user's query. The service query should be written in the same language as the WSDL, for example English. Cross Language Information Retrieval techniques do not exist in the web service discovery process. The absence...
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter Sentiment Classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. It lets users publish tweets to say what they think about topics. There are numerous web sites built on the Internet presenting Twitter. The user can enter a sentiment ta...
Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore
Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...